Automatically Generating Wikipedia Articles: A Structure-Aware Approach
نویسندگان
چکیده
In this paper, we investigate an approach for creating a comprehensive textual overview of a subject composed of information drawn from the Internet. We use the high-level structure of human-authored texts to automatically induce a domainspecific template for the topic structure of a new overview. The algorithmic innovation of our work is a method to learn topicspecific extractors for content selection jointly for the entire template. We augment the standard perceptron algorithm with a global integer linear programming formulation to optimize both local fit of information into each topic and global coherence across the entire overview. The results of our evaluation confirm the benefits of incorporating structural information into the content selection process.
منابع مشابه
WikiWrite: Generating Wikipedia Articles Automatically
The growth of Wikipedia, limited by the availability of knowledgeable authors, cannot keep pace with the ever increasing requirements and demands of the readers. In this work, we propose WikiWrite, a system capable of generating content for new Wikipedia articles automatically. First, our technique obtains feature representations of entities on Wikipedia. We adapt an existing work on document e...
متن کاملMapping WordNet synsets to Wikipedia articles
Lexical knowledge bases (LKBs), such as WordNet, have been shown to be useful for a range of language processing tasks. Extending these resources is an expensive and time-consuming process. This paper describes an approach to address this problem by automatically generating a mapping from WordNet synsets to Wikipedia articles. A sample of synsets has been manually annotated with article matches...
متن کاملAn Entity-Focused Approach to Generating Company Descriptions
Finding quality descriptions on the web, such as those found in Wikipedia articles, of newer companies can be difficult: search engines show many pages with varying relevance, while multi-document summarization algorithms find it difficult to distinguish between core facts and other information such as news stories. In this paper, we propose an entity-focused, hybrid generation approach to auto...
متن کاملWikitology: a Novel Hybrid Knowledge Base Derived from Wikipedia
Title of dissertation: WIKITOLOGY: A NOVEL HYBRID KNOWLEDGE BASE DERIVED FROM WIKIPEDIA Zareen Saba Syed, Doctor of Philosophy, 2010 Dissertation directed by: Professor Timothy W. Finin Department of Computer Science and Electrical Engineering World knowledge may be available in different forms such as relational databases, triple stores, link graphs, meta-data and free text. Human minds are ca...
متن کاملAdvertising Keyword Suggestion Using Relevance-Based Language Models from Wikipedia Rich Articles
When emerging technologies such as Search Engine Marketing (SEM) face tasks that require human level intelligence, it is inevitable to use the knowledge repositories to endow the machine with the breadth of knowledge available to humans. Keyword suggestion for search engine advertising is an important problem for sponsored search and SEM that requires a goldmine repository of knowledge. A recen...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2009